Abstract

It is well known that the human visual system can reconstruct depth from simple 
random-dot displays given motion information. This fact has lent support to the 
notion that structure from stereo and motion systems rely on low-level primitives 
or tokens, such as edges, derived from image intensities. In contrast, the judgment 
of surface attributes such as transparency or opacity is often considered to be a 
higher-level visual process that would make use of low-level stereo or motion information,
and perhaps attention or later recognition to tease apart the transparent
from the opaque parts. This is exemplified by the lack of computational studies 
dealing with transparency, compared with the at least limited success of a number 
of algorithms to solve structure from motion or stereo. In this study, we describe a 
new illusion and some results that question the above view by showing that depth 
from transparency and opacity can override the rigidity bias in perceiving depth 
from motion. This provides support for the idea that the brain’s computation of 
the surface material attribute of transparency may have to be done either before, 
or in parallel with the computation of structure from motion. 
1 Introduction


One of the major challenges of vision research is to understand how the brain constructs
a model of the visual environment from the pattern of changing retinal light
intensities. With relatively few exceptions (Poggio et al., 1988; Barrow and Tenenbaum, 
1978), computational research has sought to first divide the problem into modules such as 
surface-color-from-radiance, shape-from-shading, or structure-from-motion (Land, 1959; 
Horn, 1975; Ullman, 1979). A major result of these studies is that scene reconstruction 
from image data is often under-constrained—there are many solutions that satisfy the 
data. Prior constraints then have to be sought to find a unique interpretation of the 
environment from the image intensities. One promising avenue of research to reduce 
the strength of prior assumptions required is integration: the combination of visual
information from multiple sources, such as stereo and motion. Poggio (1985) proposed a
theory based on a Bayesian approach that attempts to estimate the posterior probability
of, say, depth, given all the data from different sensors and algorithms and a priori
knowledge, embedded in an appropriate prior distribution. The theory assumes a specific
model for the underlying probabilities, the Markov random field (MRF) model, and uses a
number of techniques, both deterministic and stochastic, to estimate the appropriate
quantities associated with the posterior probability, given the data, such as its
maximizer or its mean (Little et al., 1988). This theory formed the basis of the MIT
Vision Machine project (see, e.g., Poggio et al., 1990).

A second approach is cooperative coupling of the estimates of various scene attributes
to achieve the consistency required by the laws of image formation.^1 Consistent with the
methodology of computer vision, current physiological, anatomical and psychophysical
research indicates modular and concurrent processing, such as for motion, as distinct
from form and color (Zeki and Shipp, 1978; Livingstone and Hubel, 1987; Cavanagh,
1987). The number of distinct visual cortical areas is thought to be over twenty, each
with a potentially different function, and with both feedforward and feedback connections
between many of them (Essen, 1985). At this point, however, there are only vague
ideas of the relationship between the processing streams in the brain, the modules of 
computational analysis, and perception as they pertain to integration and cooperative 
coupling of visual information. 

In contrast to the modularity of vision research, it is phenomenally apparent that
visual information is eventually integrated to provide a strikingly singular description of
the visual environment. The visual ambiguity one expects from weak prior constraints
is the exception, rather than the rule. In the 19th century, Ernst Mach demonstrated
that perceptual representations of the environment do interact in human perception, and
interact in such a way as to produce a consistent perception of the state of the scene
that is unambiguous at a given moment, but bistable over time (Mach, 1959). In his
well-known Mach-card illusion, the perceived surface color or lightness of a simple folded
card, placed on a table, depends on light source direction and on the bistably perceived
geometry of the card. We describe a new illusion that, like the Mach card, has a bistable
3D interpretation; here, however, the bistability is induced through motion parallax, and
rather than interacting with the lightness of a surface, the perceived depth affects the
phenomenal transparency.^2 Using this stimulus, we have studied how the human perception
of depth from motion interacts with the perceived surface attribute of opacity.

^1 Cooperative coupling refers to the interaction between two perceptual representations
of scene attributes (such as surface depth and reflectance) in order to satisfy a mutual
consistency constraint usually imposed by how the image could be formed physically. The
Mach card is an example of the cooperative coupling of perceived reflectance and relative
depth. See D. J. Kersten, in “Computational Models of Visual Processing”, M. Landy,
A. Movshon, Eds. (M.I.T. Press, Cambridge, Massachusetts, 1991), and H. Bülthoff and
A. Yuille, SPIE Visual Communication and Image Processing (1990) for a discussion of
coupling of visual information.

It is well-known that motion provides information about relative depth relationships 
between surfaces in the world. Interactions between depth from motion and other
depth sources, such as proximity luminance, have been studied before (Dosher et al., 
1986). It has recently been discovered that degree of transparency determines whether 
two superimposed and independently moving square wave patterns are seen as moving in 
a single direction or in two independent directions (Ramachandran, 1989; Stoner et al., 
1990). Less well appreciated is the fact that transparency cues also provide depth
information. Particular intensity relationships not only determine whether transparency 
is seen (Metelli, 1974; Beck et al., 1984), but also bias which of two overlapping surfaces 
is seen in front. We call this depth from transparency. Perception of transparency can 
lead to neon-color spreading, and loss of stereoscopic capture (Nakayama et al., 1989). 
It has also been shown that perception of incorrect depth from transparency can lead 
to a delay in seeing the correct depth relationships between surfaces based on stereo 
or motion information (Kersten et al., 1989). In this paper we specifically address the 
question: “When motion and transparency contradict, which takes precedence—motion 
or transparency information?” 

2 Method 

In an attempt to answer the above question, we simulated an object consisting of two 
square planar parallel surfaces that could rigidly rock back and forth about a vertical axis 
perpendicular to the line of sight (Fig. 1). Animated sequences of images corresponding
to a perspective view of two planar and possibly transparent faces (each a simulated 5
× 5 cm square) were generated with a Macintosh II computer and displayed on a CRT
monitor with a 256 gray-level capacity. The bias of the apparent depth of the two faces 
was controlled by motion and the intensity relations in the display that invoke various 
types of transparency. To provide motion information about the relative depths of the 
two faces, they were rocked back and forth rigidly about the vertical axis passing between 
the two surfaces and passing through a point equidistant to both. Like the Necker cube,
which is an orthographic projection of a wire cube, a particular image frame can give rise
to an ambiguous depth percept; the top face can appear in front of or behind the bottom
face. The bias of the apparent depth of the two faces was controlled by motion and
transparency pattern. To provide motion information about the relative depths of the
two faces, the planes oscillated sinusoidally back and forth by 40 deg about the vertical
axis at 0.48 Hz. The distance between the point equidistant between the two faces and
the observer’s eye-point was 57 cm. There were 21 frames per period. The planes could
be seen as square when in a head-on view, but typically appeared trapezoidal due to
perspective. The top (or bottom) face could appear either in front of or behind the other.
The depth relation seen depends on perceived transparency and motion. The particular
intensity relationships of the four regions bias the apparent transparency of a face, and
thus determine the relative depth of the front and back planes. The motion together
with a bias toward rigidity also affects the depth one sees (Wallach and O’Connell, 1953;
Ullman, 1979). Depth also depends on the a priori bias of the observer to see a rigid
body in perspective with the front face larger than the rear face, or alternatively, with
the front face smaller than the rear face, but we do not study this here.

^2 Phenomenal transparency of a surface means we can see through it to another background
surface. A perceptual consequence of phenomenal transparency is interpreting the
transparent surface as being in front of the background.
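The stimulus geometry described above can be sketched in a few lines. This is only an illustrative reconstruction, not the original Macintosh II program: the viewing distance, face size, rocking frequency and frames per period come from the text, whereas the reading of "by 40 deg" as 40 deg peak to peak (a 20 deg amplitude) and the depth offset of each face from the rotation axis are assumptions made for the example.

```python
import math

VIEW_DISTANCE_CM = 57.0      # eye to rotation axis, from the text
FACE_SIDE_CM = 5.0           # simulated side of each square face, from the text
ROCK_FREQUENCY_HZ = 0.48     # from the text
FRAMES_PER_PERIOD = 21       # from the text
ROCK_AMPLITUDE_DEG = 20.0    # assumption: "by 40 deg" read as 40 deg peak to peak
FACE_DEPTH_OFFSET_CM = 1.0   # hypothetical distance of each face from the axis

# Frame rate implied by 21 frames per 0.48 Hz period.
FRAME_RATE_HZ = ROCK_FREQUENCY_HZ * FRAMES_PER_PERIOD


def rock_angle_deg(frame: int) -> float:
    """Sinusoidal rotation angle about the vertical axis for a given frame."""
    phase = 2.0 * math.pi * frame / FRAMES_PER_PERIOD
    return ROCK_AMPLITUDE_DEG * math.sin(phase)


def project(x: float, y: float, z: float, angle_deg: float) -> tuple[float, float]:
    """Rotate a point about the vertical (y) axis through the origin, then
    perspective-project it onto the plane through the axis.
    z is positive away from the observer."""
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) + z * math.sin(a)
    z_rot = -x * math.sin(a) + z * math.cos(a)
    scale = VIEW_DISTANCE_CM / (VIEW_DISTANCE_CM + z_rot)
    return x_rot * scale, y * scale


def face_outline(z: float, angle_deg: float, center_y: float = 0.0):
    """Projected corners of one square face lying at depth z before rotation."""
    h = FACE_SIDE_CM / 2.0
    corners = [(-h, center_y - h), (h, center_y - h),
               (h, center_y + h), (-h, center_y + h)]
    return [project(x, y, z, angle_deg) for x, y in corners]


if __name__ == "__main__":
    for frame in range(0, FRAMES_PER_PERIOD, 5):
        angle = rock_angle_deg(frame)
        near = face_outline(-FACE_DEPTH_OFFSET_CM, angle)
        print(f"frame {frame:2d}  angle {angle:6.2f} deg  near-face corners {near}")
```

In a head-on frame the nearer face projects slightly larger than the farther one, and at any nonzero angle each face projects to a trapezoid, which is the perspective information the rigid interpretation relies on.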



Figure 1: Animated sequences of images corresponding to a perspective view of two rigidly 
coupled planar and possibly transparent faces were displayed on an 8-bit CRT monitor. The
object was rocked back and forth rigidly about the vertical axis passing between the two surfaces 
and through a point equidistant to both. Like the Necker cube, a particular image frame can 
give rise to an ambiguous depth percept: the top face can appear in front of or behind the 
bottom face. 
3 Basic Perceptual Phenomena


In the following sections we will describe the basic perceptual phenomena, and then 
detail the results of some quantitative measurements. In all three of the demonstrations 
discussed below, the rigid motion is described as being consistent with the bottom face 
being in front of the top face and only the intensities of the various regions are changed. 
The basic phenomena are unaffected by placing the top face in front. 

3.1 Opaque Surfaces 

First we looked at the case in which both surfaces have zero transparency—that 
is, they are both opaque with the bottom square in front, and partially occluding the 
top (Fig. 2a). When the object was rocked back and forth, not surprisingly, observers 
saw rigid motion that was consistent with both the motion and occlusion cues. Next 
the intensities were adjusted so that the top patch appeared to occlude the bottom in 
contradiction to the rigid motion which indicated that the bottom square was in front. 
Occlusion completely inhibited the rigid interpretation, and we saw the two faces slipping 
and sliding over one another. This percept persists for many minutes. After a while, some
observers report that they can see the outside edges of the two surfaces move as if rigidly
coupled if they consciously discount the “T” junctions indicating occlusion. All seven
observers who were informally queried reported seeing weak, but definite subjective
contours that complete the occluded square behind the center overlapping patch.
Interestingly, these faint contours are visible even when nonrigid motion is seen, as if
the occluding patch were transparent.

3.2 Relaxed Occlusion 

Next we relaxed the occlusion cue, by adjusting the intensities of the patches so that 
one of the two faces appeared transparent. In one case, we adjusted the intensities so that 
either of the surfaces could appear to be a dark film lying over a light gray background, 
referred to below as a high contrast “dark/darker” condition (see Fig. 2f and Table 1). In 
this condition, even when the surfaces are stationary, the depth relations are ambiguous 
and bistable, in that either the top or bottom surface may appear in front in a stationary 
view. From a formal point of view, one might expect this when the image results from 
multiplying two source images. Multiplication is commutative, so there is no way to 
decide which surface is in front. It is curious to note that the plausible alternative of
both surfaces being transparent is never reported. One can also adjust the intensities of
the top and bottom squares to be equal, in which case the only remaining biases are to see
the bottom face in front rather than the top, and the larger rather than the smaller
(Fig. 2g). In either
case, when the two planes were rocked back and forth, we saw a striking bistability. If 
the bottom face was seen in front in an initial static view, we saw both planes rigidly 
rocking back and forth with the bottom face appearing transparent, and the top face 
opaque. After watching this for anywhere between 2 and 30 seconds, suddenly the top face
would appear in front and then the perceived motion was one of two faces slipping and
sliding over each other. Simultaneous with this reversal of depth, there was an exchange
of surface property: the top face now appeared transparent and the bottom opaque. The
fact that these multistable percepts are still seen when the transparency cues to depth
were exactly balanced (Fig. 2g) shows that a default assignment of relative depth (as
with a stationary Necker cube) and transparency is made which interacts with depth
from motion.

[Figure 2 panel labels legible in the source: a. occlusion; b. light / contrast reduce;
c. contrast reduce / dark; g. dark / dark (top-bottom-equal)]

Figure 2: Five transparency types were used to induce different strengths of
depth-from-transparency cues in which the top-bottom squares could have the following
effect on the intensities that they covered: dark/darker, contrast reduce/dark, light/dark,
light/contrast reduce, lighter/light. These five types were built from permutations of four
intensities: 16, 26, 38 and 51 cd/m^2 for a high contrast condition. We also tested
responses to five low contrast versions of these five types, an occlusion case, and a
balanced dark/dark condition in which the top and bottom were equal in intensity (see
Table 1).
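The commutativity argument of Section 3.2 can be made concrete with a small numerical sketch. It assumes purely multiplicative films, each described by a single transmittance; the two transmittance values are hypothetical and are not fitted to the stimuli, and only the background luminance is taken from Table 1.

```python
import math


def overlap_luminance(background: float, transmittances: list[float]) -> float:
    """Luminance seen through a stack of purely multiplicative films.
    Films are listed from the one nearest the background to the one
    nearest the observer."""
    luminance = background
    for t in transmittances:
        luminance *= t
    return luminance


BACKGROUND = 51.0   # cd/m^2, background luminance used in Table 1
T_TOP = 0.7         # hypothetical transmittance of the top face
T_BOTTOM = 0.5      # hypothetical transmittance of the bottom face

# Overlap region with the top face in front of the bottom face ...
top_in_front = overlap_luminance(BACKGROUND, [T_BOTTOM, T_TOP])
# ... and with the bottom face in front of the top face.
bottom_in_front = overlap_luminance(BACKGROUND, [T_TOP, T_BOTTOM])

if __name__ == "__main__":
    print(top_in_front, bottom_in_front)
    print("same image either way:", math.isclose(top_in_front, bottom_in_front))
```

Because the two depth orders give the same four region intensities, a static image of this kind carries no information about which face is in front, which is in line with the bistability reported for the dark/darker and dark/dark displays.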


3.3 Diaphanous Transparency 

In a third demonstration, we sought a condition intermediate between the symmetric 
transparency of a “dark/dark” combination and complete occlusion by constructing a 
transparent overlay that appears diaphanous. A diaphanous transparent square has both 
additive and multiplicative components that, as shown below, bias its relative depth to 
be in front of the other square. This can be physically realized by a perforated screen 
whose holes are below the spatial resolution limit and which transmits a fraction of the 
light coming from behind, and reflects a fraction coming from the front (Richards and 
Witkin, 1979; Kersten, 1991). Consistent with the interpretation of a perforated screen, 
a film that reduces the contrast of the edges it overlays by lightening the darker region, 
and darkening the lighter, without changing contrast polarity tends to be seen in front 
(Fig. 2b,c). In the demonstration, the top square was made to appear contrast reducing. 
The bottom square was made to appear as a dark milky film behind the high contrast 
reducing top square (the high contrast “contrast reduce/dark” condition in Table 1, 
Fig. 2c). When the two faces were rocked back and forth, we saw the wrong motion. 
Just as in the case of occlusion, the surfaces appeared to slip nonrigidly over one another 
with the top face appearing in front. After several seconds of observation, suddenly rigid 
motion is seen at which time the top contrast reducing square is seen behind a dark 
bottom film. Again there was a simultaneous and unambiguous reversal of apparent 
transparency—the contrast reducing top square suddenly appeared opaque and behind 
a dark film at the bottom. 
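One way to see why the contrast reducing square is biased toward the front is to fit a film with a multiplicative and an additive component to the four luminances of the high contrast “contrast reduce/dark” condition in Table 1. The linear form covered = t × uncovered + a is an assumption chosen here for illustration (a minimal reading of "additive and multiplicative components"), not a model stated in the text.

```python
def fit_film(uncovered: tuple[float, float], covered: tuple[float, float]):
    """Solve covered = t * uncovered + a from two regions seen with and
    without the film. Returns (t, a): transmittance and additive term."""
    (u1, u2), (c1, c2) = uncovered, covered
    t = (c1 - c2) / (u1 - u2)
    a = c1 - t * u1
    return t, a


# Table 1, contrast reduce/dark (HC), luminances in cd/m^2.
TOP, CENTER, BOTTOM, BACKGROUND = 38.0, 26.0, 16.0, 51.0

# Interpretation 1: the top face is the film in front.
# It turns the background (51) into 38 and the bottom face (16) into 26.
t_top, a_top = fit_film((BACKGROUND, BOTTOM), (TOP, CENTER))

# Interpretation 2: the bottom face is the film in front.
# It turns the background (51) into 16 and the top face (38) into 26.
t_bottom, a_bottom = fit_film((BACKGROUND, TOP), (BOTTOM, CENTER))

if __name__ == "__main__":
    print(f"top in front:    t = {t_top:.3f}, a = {a_top:.2f} cd/m^2")
    print(f"bottom in front: t = {t_bottom:.3f}, a = {a_bottom:.2f} cd/m^2")
```

Under this assumed model the top-in-front reading gives a transmittance between 0 and 1 with a positive additive term, as a perforated screen would, whereas the bottom-in-front reading requires a negative transmittance, that is, a reversal of contrast polarity.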

4 Interaction between Transparency and Structure from Motion 

In order to quantify the interaction between transparency cues on depth and structure 
from motion, we made measurements of the reaction time to see rigid motion conditional 
on the perceived depth relations seen in an initial static view. The time to see rigid 
motion was measured in two basic conditions in which the initial depth perception, 
based on transparency, could either conflict (inconsistent condition) or agree (consistent
condition) with the subsequent 3D rigid motion. The experimental set-up was as before.

By specifying the gray-levels of the four image regions, it was possible to control 
apparent transparency, and thus bias whether the top face or the bottom face appeared 
in front. We chose 12 different transparency types summarized in Table 1. The notion 
of the transparency type indicates how the top and bottom patches affect the brightness 
of the background. The first and second words on the label for a transparency type 
indicate how the top and bottom faces affect the brightness of the patches they cover, 
respectively. If both faces lighten the background, one of them still appears lighter and 
is indicated in the label. The same rule is used when both faces darken the background. 
For example, a “dark/darker” transparency means that both the top and bottom faces 
darkened what they cover, and that the bottom one was darker than the top. There
are twenty-four possible permutations, but these can be reduced to just six by excluding
top/bottom symmetry and the physically implausible contrast reversing and contrast
enhancing pairs. Of these six, two involve faces that both darkened the underlying
surfaces, so one was eliminated, leaving five. In order to further increase the range of
transparency types, we also added five stimuli in which the local edge contrast (Michelson
contrast) of the lower right hand corner of the central patch was smaller.

                           |       Luminance (cd/m^2)         |
Transparency type          | Top | Center | Bottom | Background | Contrast (%)
---------------------------+-----+--------+--------+------------+-------------
top-bottom-equal dark/dark |  26 |     16 |     26 |         51 |
occlusion                  |  38 |     16 |     16 |         51 |    0
dark/darker (HC)           |  38 |     16 |     26 |         51 |  -24
contrast reduce/dark (HC)  |  38 |     26 |     16 |         51 |   24
light/dark (HC)            |  51 |     26 |     16 |            |   24
light/contrast reduce (HC) |  51 |     38 |     26 |            |   19
lighter/light (HC)         |  38 |     51 |     26 |         16 |   32
dark/darker (LC)           |  38 |     16 |     19 |         51 |   -8.6
contrast reduce/dark (LC)  |  38 |     26 |     23 |         51 |    6.1
light/dark (LC)            |  51 |     26 |     23 |         38 |    6.1
light/contrast reduce (LC) |  51 |     38 |     34 |         16 |    5.6
lighter/light (LC)         |  38 |     51 |     46 |         16 |    5.2

Table 1: Intensity values of the top, center, bottom and background regions of the two
planes are shown in cd/m^2. HC and LC refer to high and low contrast conditions,
respectively.
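The contrast values listed with the luminances in Table 1 can be checked against the Michelson definition. The sketch below assumes that the tabulated contrast is the signed Michelson contrast between the center and bottom regions; this pairing is inferred from the numbers and is not stated in the text.

```python
def michelson_contrast(a: float, b: float) -> float:
    """Signed Michelson contrast (a - b) / (a + b), in percent."""
    return 100.0 * (a - b) / (a + b)


# (center, bottom) luminances in cd/m^2, copied from Table 1.
CENTER_BOTTOM = {
    "occlusion": (16, 16),
    "dark/darker (HC)": (16, 26),
    "contrast reduce/dark (HC)": (26, 16),
    "light/contrast reduce (HC)": (38, 26),
    "lighter/light (HC)": (51, 26),
    "dark/darker (LC)": (16, 19),
    "contrast reduce/dark (LC)": (26, 23),
    "light/contrast reduce (LC)": (38, 34),
    "lighter/light (LC)": (51, 46),
}

if __name__ == "__main__":
    for name, (center, bottom) in CENTER_BOTTOM.items():
        print(f"{name:28s} {michelson_contrast(center, bottom):6.1f} %")
```

With this pairing the computed values agree with the tabulated ones after rounding, which supports reading the last column of Table 1 as the local edge contrast used to define the high and low contrast groups.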

To understand our selection better, consider the top horizontal edge of the bottom 
patch of one of the transparencies in Figure 2. It crosses a vertical boundary of the 
top patch. If the bottom patch is not seen as a hole, the horizontal edge is attached 
to, or “intrinsic” to the bottom patch. This bottom film can either preserve or reverse 
the contrast polarity of the two regions separated by the vertical edge. A high contrast 
reversing surface does not in general appear transparent. Suppose the horizontal edge 
is contrast preserving. Then it can either lighten or darken the underlying regions, or it 
can reduce or enhance the contrast at the vertical edge. When the horizontal edge of a 
neutral density filter with transmittance less than 100% crosses the vertical boundary, 
it darkens the intensity on both sides of this edge (see “dark/darker” condition). A 
purely positive additive transparency lightens both regions that it covers. Of particular 
interest here is an edge that reduces contrast in the sense that it lightens the darker of 
the two regions it covers, and darkens the lighter without reversing the contrast polarity 
(“contrast reducing” condition). If the horizontal edge reduces contrast, there must be a 
vertical edge that darkens both regions while reversing contrast. Further, the horizontal 
edge, if considered attached to the top region, is contrast enhancing in the sense of 


7 











































darkening the darker of two regions it covers, and lightening the lighter without changing 
contrast polarity. Surfaces attached to contrast enhancing edges are not likely to be seen 
as transparent surface discontinuities. This provides a cue to edge attachment, and thus 
occlusion. 
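The edge categories discussed above (lightening, darkening, contrast reducing, contrast enhancing, and polarity reversing) can be expressed as a small classifier. The function name and its return format are hypothetical conveniences; the two worked cases use the luminances of the high contrast “contrast reduce/dark” condition from Table 1.

```python
def classify_film_edge(uncovered: tuple[float, float],
                       covered: tuple[float, float]) -> tuple[str, bool]:
    """Classify how a candidate film changes two adjacent regions.

    uncovered = (dark, light) luminances outside the film;
    covered   = luminances of the same two regions seen through the film.
    Returns (effect, polarity_preserved). The effect label describes the
    direction of change in each region; polarity is reported separately.
    """
    u_dark, u_light = uncovered
    c_dark, c_light = covered
    if u_dark >= u_light:
        raise ValueError("uncovered must be given as (dark, light)")
    polarity_preserved = c_light > c_dark
    d_dark = c_dark - u_dark
    d_light = c_light - u_light
    if d_dark > 0 and d_light > 0:
        effect = "lighten"
    elif d_dark < 0 and d_light < 0:
        effect = "darken"
    elif d_dark > 0 and d_light < 0:
        effect = "contrast reduce"
    elif d_dark < 0 and d_light > 0:
        effect = "contrast enhance"
    else:
        effect = "unchanged in at least one region"
    return effect, polarity_preserved


# Table 1, contrast reduce/dark (HC): top 38, center 26, bottom 16, background 51.
# Top face as the film: bottom (16) -> center (26), background (51) -> top (38).
top_as_film = classify_film_edge((16, 51), (26, 38))
# Bottom face as the film: top (38) -> center (26), background (51) -> bottom (16).
bottom_as_film = classify_film_edge((38, 51), (26, 16))

if __name__ == "__main__":
    print("top as film:   ", top_as_film)
    print("bottom as film:", bottom_as_film)
```

For these luminances the top face is classified as contrast reducing with polarity preserved, while treating the bottom face as the film yields a darkening of both regions with reversed polarity, matching the statement that a contrast reducing edge implies a crossing edge that darkens while reversing contrast.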

4.1 Perceptual biases 

We wanted to find out how the degree of bias to see a particular surface as transparent 
would affect the time to see rigid motion when the motion either agreed or disagreed with 
the depth from transparency cues. 

In order to increase the number of stimuli, we included the five additional
transparencies, similar to those in Figure 2, in which the local edge contrast of the lower
right hand corner of the central patch was smaller. The high and low contrast groups had
contrasts whose absolute values were at least 19% and at most 8.6%, respectively. On half
of the trials, the top face was in front of the bottom face (front-top), as defined by the
subsequent motion, and on the other half of the trials, it was behind the bottom face
(front-bottom). Further, because the perspective view made the image of the front patch
larger than the back, the observers were shown the stimuli with the top and bottom
intensities “normal” or “exchanged” for each of the front-top and front-bottom conditions.
Subjects first viewed a static head-on view of the two faces from a distance of 57 cm.
Because we could not guarantee, for example, that a given transparency condition would
generate a consistent depth ordering, the observer was asked to indicate whether the top
or bottom surface appeared in front by pushing a button. This button press also initiated
the animation of the object. The subject was to push another button once rigid motion
was seen. The time to see rigid motion was measured. There were 5 subjects, one of whom
was naive to psychophysical experiments. Each subject saw each stimulus eight times.
The presentation order was randomized.
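The trial structure described above can be sketched as a randomized, fully crossed schedule. The sketch assumes that "each stimulus" means one combination of transparency type, normal/exchanged intensities, and front-top/front-bottom motion, and that all combinations were crossed; both points are our reading of the procedure, and the seed and function name are illustrative.

```python
import itertools
import random
from collections import Counter

TRANSPARENCY_TYPES = [
    "top-bottom-equal dark/dark", "occlusion",
    "dark/darker (HC)", "contrast reduce/dark (HC)", "light/dark (HC)",
    "light/contrast reduce (HC)", "lighter/light (HC)",
    "dark/darker (LC)", "contrast reduce/dark (LC)", "light/dark (LC)",
    "light/contrast reduce (LC)", "lighter/light (LC)",
]
INTENSITY_ASSIGNMENTS = ["normal", "exchanged"]
MOTION_DEFINED_FRONT = ["front-top", "front-bottom"]
REPETITIONS = 8  # "Each subject saw each stimulus eight times."


def build_schedule(seed: int = 0) -> list[tuple[str, str, str]]:
    """Return a shuffled list of (transparency type, assignment, motion) trials."""
    conditions = list(itertools.product(
        TRANSPARENCY_TYPES, INTENSITY_ASSIGNMENTS, MOTION_DEFINED_FRONT))
    trials = conditions * REPETITIONS
    random.Random(seed).shuffle(trials)
    return trials


if __name__ == "__main__":
    schedule = build_schedule()
    print("trials per subject:", len(schedule))
    print("first trial:", schedule[0])
    print("presentations per condition:", set(Counter(schedule).values()))
```

Under these assumptions each transparency type is shown 32 times per subject in the initial static view (2 assignments × 2 motion conditions × 8 repetitions).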

A five-way ANOVA on reaction times (subjects vs. normal/exchanged vs.
front-top/front-bottom vs. contrast vs. transparency type) showed a significant three-way
interaction between transparency type, normal/exchanged, and front-top/front-bottom
factors (p < 0.0001), indicating that there was a preferred face to be seen in front in
a static view that interacted with the subsequent motion. There was also a significant
difference in the range of observers’ reaction times, between 0.5 and 3 seconds for one
observer, and between 1 and 30 seconds for a second. There was no significant main
effect of high vs. low contrast on the interaction.

Figure 3 presents the main observation of reaction time for two observers in a simpler
way by averaging the reaction times over conditions in which the depth from transparency
is either consistent or inconsistent with depth from motion. Motion and transparency
information could be consistent (or inconsistent) in two ways. For example, the
transparency information could either indicate that the bottom square was in front when
rigid motion concurred, or that it was behind when rigid motion concurred. Figure 3 shows
that the reaction times were substantially longer when the transparency cues gave depth
relations inconsistent with the subsequent rigid motion for all transparency conditions for
two observers. We have tested 5 other observers on 15 other variations of transparency
relations and this pattern of results has held for all: the consistent reaction times are
shorter than the inconsistent times, although as in Figure 3 there are substantial
individual differences in the values of the average times.

Figure 3: The times to see rigid motion of the front and back faces, measured for 12
different opacity conditions, are shown here for two observers. In all cases the time to
see rigid motion when the initial static opacity or transparency cues indicated a relative
depth that was inconsistent with the subsequent rocking motion was longer than when the
cues were consistent. The transparency types are arranged from bottom to top in order of
increasing likelihood that a particular plane consistently appears in front of (or behind)
the other face (see Table 2).
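The consistent/inconsistent averaging behind Figure 3 can be illustrated as follows. The trial tuples and reaction times are made-up placeholder values, not data from the experiment, and labeling a trial by comparing the observer's static report with the motion-defined front face is our reading of the analysis.

```python
from collections import defaultdict
from statistics import mean


def mean_rt_by_consistency(trials):
    """Average time-to-rigidity for consistent and inconsistent trials.

    Each trial is (face reported in front in the static view,
                   face in front as defined by the rigid motion,
                   reaction time in seconds).
    """
    groups = defaultdict(list)
    for static_front, motion_front, reaction_time in trials:
        label = "consistent" if static_front == motion_front else "inconsistent"
        groups[label].append(reaction_time)
    return {label: mean(times) for label, times in groups.items()}


# Placeholder trials for illustration only.
EXAMPLE_TRIALS = [
    ("top", "top", 1.0),
    ("bottom", "top", 4.0),
    ("bottom", "bottom", 2.0),
    ("top", "bottom", 6.0),
]

if __name__ == "__main__":
    print(mean_rt_by_consistency(EXAMPLE_TRIALS))
```

Note that both "top reported, top in front" and "bottom reported, bottom in front" count as consistent, which corresponds to the two ways in which motion and transparency information can agree.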

There was also an effect of the type of transparency on the preferred depth relation
seen. In Figure 4 the same data are replotted in a different way in order to visualize
the gradual increase in the reaction time with the strength of inconsistency given by a
face-in-front bias. This bias is the proportion of times a particular face was perceived in
front in the initial static view. Apart from occlusion and contrast-reducing transparency,
there was no general rule to predict the face-in-front bias across observers. However, in
all four of the contrast reducing conditions, the contrast reducing face appeared in front
of any other type of face in the initial static view at least 50% of the time (Table
2). In two of the conditions (“light/contrast reducing” and “contrast reducing/dark”)
the probability of seeing the contrast reducing face in front was 97% or more for all five
observers.^3

Transparency type      | dck           | Ic            | zl            | sk            | pm
                       | Plane       % | Plane       % | Plane       % | Plane       % | Plane       %
-----------------------+---------------+---------------+---------------+---------------+--------------
top-bottom-equal       | bigger     69 | bigger     75 | bigger     56 | smaller    56 | bigger     63
occlusion              | occluder      | occluder   97 | occluder  100 | occluder  100 | occluder   91
dark / darker (HC)     | darker     88 | neither    50 | dark       62 | darker     53 | neither    50
cont. rd. / dark (HC)  | cont. rd.  97 |               |            97 | cont. rd. 100 | cont. rd.  97
light / dark (HC)      | dark       63 | light      84 | light      91 | light      81 | light      56
light / cont. rd. (HC) | cont. rd. 100 | cont. rd.     |               |               | cont. rd.  59
lighter / light (HC)   |               |               | lighter    66 |               | lighter    53
dark / darker (LC)     | darker     84 | darker     72 | dark       69 |               | darker     69
cont. rd. / dark (LC)  | cont. rd.     |               |               |            62 | neither    50
light / dark (LC)      | dark      100 | dark       94 | dark       72 | dark       72 | dark       78
light / cont. rd. (LC) | cont. rd. 100 |               | cont. rd.     |               |
lighter / light (LC)   | light      78 | light      84 | lighter    62 |               | lighter    78

Table 2: The face-in-front bias for different transparency types is shown for five
subjects. For each subject, “Plane” is the plane favored to be seen in front, and the bias
is measured as the percentage of time that face appears in front in a static view.

^3 The probability was estimated by averaging over 16 presentations each for all five
observers.

Figure 4: Mean time (±SEM) to see rigid motion plotted against the face-in-front bias for
two observers. The face-in-front bias is the proportion of times a particular face appeared
in front in the initial static view. Results from 12 transparency conditions are plotted.
Each point is the mean of 16 measurements, averaged over conditions in which the top and
bottom intensities were exchanged.

5 Discussion

Evidence has been presented elsewhere that surface occlusion information may be
represented early in the visual system. In particular, occlusion can override stereo
(Ramachandran and Cavanagh, 1985), raise recognition performance for faces (Nakayama
et al., 1988), and affect motion perception (Shimojo et al., 1989). Our results are
consistent with the idea that the determination of the regions to which the boundary of a
surface belongs (i.e., whether edges are intrinsic or extrinsic) is done early. We add to
this that the attachment of an edge to a region is influenced by transparency, and is also
done early enough to affect the perceived relative motion between two surfaces.

Computational vision research has underlined the importance of questions of
representation, modularity and algorithm (Marr, 1982). In addition, we need to know what to
compute when. The striking bistability of the perceived motion together with the
quantitative increase in reaction time when motion and transparency cues are in opposition
strongly suggest that surface transparency and relative depth are explicitly represented
in the brain, and that they are computed cooperatively, rather than in strict sequence.
These results point to central problems of depth integration and representation, and
cooperative computation of multiple scene attributes. In previous studies (Bülthoff and
Mallot, 1987; Bülthoff and Mallot, 1988), depth from shading and stereo was shown
to accumulate, gradually increasing the perceived curvature of a smooth convex surface
when the cues were consistent. As here, however, inconsistent cues were not resolved by
averaging. One could imagine an accumulation of depth from transparency, with a gradual
increase in the contrast reduction of a planar surface mixing with the depth from motion
to produce an intermediate relative depth. But this does not happen. The perceived
depth is fixed until suddenly it flips. What kind of mechanism can explain this? One way
of viewing multistability is in terms of the brain constructing an a posteriori probability
of the world’s state of affairs conditional on the image data (Kersten, 1991).
Multistability is reflected in multiple modes of the probability distribution. This
formulation, however, does not answer the mystery of how the switch is made from one mode
to the next. A number of the properties of simulated neural-like networks parallel
properties of perceptual multistability (Kawamoto and Anderson, 1985), but whether this is
how the computation is realized in the brain remains a challenging problem for the future.